Efficient Neural SDE Training using Wiener-Space Cubature

Snow, Luke, Krishnamurthy, Vikram

arXiv.org Artificial Intelligence

A neural stochastic differential equation (SDE) is an SDE with drift and diffusion terms parametrized by neural networks. The training procedure for neural SDEs consists of optimizing the SDE vector field (neural network) parameters to minimize the expected value of an objective functional on infinite-dimensional path-space. Existing training techniques focus on methods to efficiently compute path-wise gradients of the objective functional with respect to these parameters, then pair this with Monte-Carlo simulation to estimate the expectation and stochastic gradient descent to optimize. In this work we introduce a novel training technique which bypasses and improves upon Monte-Carlo simulation: we extend results in the theory of Wiener-space cubature to approximate the expected objective functional by a weighted sum of deterministic ODE solutions. This allows us to compute gradients by efficient ODE adjoint methods. Furthermore, we exploit a high-order recombination scheme to drastically reduce the number of ODE solutions necessary to achieve a reasonable approximation. We show that this Wiener-space cubature approach can surpass the O(1/sqrt(n)) rate of Monte-Carlo simulation, and the O(log(n)/n) rate of quasi-Monte-Carlo, to achieve an O(1/n) rate under reasonable assumptions.
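To make the core idea concrete, here is a minimal Python sketch of cubature on Wiener space for a scalar Stratonovich SDE, using the classical degree-3 formula (two straight-line paths to ±sqrt(T), each with weight 1/2). The drift, diffusion, and test functional below are illustrative stand-ins for the paper's neural vector fields, and the high-order recombination scheme is omitted.

```python
# Minimal sketch: degree-3 Wiener-space cubature for a scalar Stratonovich SDE
#   dX = b(X) dt + s(X) ∘ dW  on [0, T].
# Each cubature path w(t) = ±t/sqrt(T) turns the SDE into a deterministic ODE,
# and E[f(X_T)] is approximated by the weighted sum of the ODE endpoints.
import numpy as np
from scipy.integrate import solve_ivp

def cubature_expectation(b, s, f, x0, T):
    """Approximate E[f(X_T)] with the two degree-3 cubature paths."""
    results = []
    for sign in (+1.0, -1.0):
        # Substituting dW -> w'(t) dt, with w'(t) = sign / sqrt(T), gives an ODE.
        ode = lambda t, x, sg=sign: b(x) + sg * s(x) / np.sqrt(T)
        sol = solve_ivp(ode, (0.0, T), [x0], rtol=1e-8, atol=1e-10)
        results.append(f(sol.y[0, -1]))
    return 0.5 * results[0] + 0.5 * results[1]  # cubature weights: 1/2, 1/2

# Example: geometric Brownian motion in Stratonovich form, where
# E[X_T] = x0 * exp((mu + sigma^2 / 2) * T).
mu, sigma, x0, T = 0.1, 0.2, 1.0, 1.0
est = cubature_expectation(lambda x: mu * x, lambda x: sigma * x,
                           lambda x: x, x0, T)
print(est, x0 * np.exp((mu + 0.5 * sigma**2) * T))
```

Because each cubature path yields a deterministic ODE, gradients with respect to the vector-field parameters can then flow through standard ODE adjoint methods rather than a stochastic simulation, which is the efficiency gain the paper builds on.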


Explicit construction of recurrent neural networks effectively approximating discrete dynamical systems

Nakayama, Chikara, Yoneda, Tsuyoshi

arXiv.org Artificial Intelligence

We consider arbitrary bounded discrete time series originating from dynamical systems with recursivity. More precisely, we provide an explicit construction of recurrent neural networks which effectively approximate the corresponding discrete dynamical systems.
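As a loose illustration of the flavor of such a result (recurrent weights set by an explicit formula rather than by training), the sketch below builds a one-hidden-layer ReLU network that interpolates a given one-dimensional map and then iterates it as a recurrence. The logistic map and grid size are illustrative choices, not the paper's setting.

```python
# Toy "explicit construction": hand-set weights for a ReLU recurrence whose
# state update is the piecewise-linear interpolant of a target map F on [0, 1].
import numpy as np

def build_relu_rnn(F, n=64):
    """Return weights (w, b, v, c) so that x -> c + v @ relu(w*x + b)
    interpolates F at n+1 uniform grid points on [0, 1]."""
    grid = np.linspace(0.0, 1.0, n + 1)
    vals = F(grid)
    slopes = np.diff(vals) / np.diff(grid)   # slope of F-hat on each cell
    jumps = np.diff(slopes, prepend=0.0)     # slope change at each knot
    w = np.ones(n)                           # hidden weights
    b = -grid[:-1]                           # hidden biases: relu(x - x_k)
    v = jumps                                # output weights
    c = vals[0]                              # output bias
    return w, b, v, c

def rnn_step(x, params):
    w, b, v, c = params
    return c + v @ np.maximum(w * x + b, 0.0)   # one recurrent update

F = lambda x: 3.7 * x * (1.0 - x)                # chaotic logistic map
params = build_relu_rnn(F)

x_true = x_net = 0.3
for t in range(5):
    x_true, x_net = F(x_true), rnn_step(x_net, params)
    print(t, x_true, x_net)
```

The interpolation error shrinks as the grid is refined, though for a chaotic map the iterated trajectories eventually separate; effective approximation over a fixed horizon is the relevant guarantee.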


Understanding Counting in Small Transformers: The Interplay between Attention and Feed-Forward Layers

Behrens, Freya, Biggio, Luca, Zdeborová, Lenka

arXiv.org Artificial Intelligence

We provide a comprehensive analysis of simple transformer models trained on the histogram task, where the goal is to count the occurrences of each item in the input sequence from a fixed alphabet. Despite its apparent simplicity, this task exhibits a rich phenomenology that allows us to characterize how different architectural components contribute towards the emergence of distinct algorithmic solutions. In particular, we showcase the existence of two qualitatively different mechanisms that implement a solution, relation- and inventory-based counting. Which solution a model can implement depends non-trivially on the precise choice of the attention mechanism, activation function, memorization capacity and the presence of a beginning-of-sequence token. By introspecting learned models on the counting task, we find evidence for the formation of both mechanisms. From a broader perspective, our analysis offers a framework to understand how the interaction of different architectural components of transformer models shapes diverse algorithmic solutions and approximations.
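The link between attention statistics and counting can be seen in a hand-set toy construction (an illustrative sketch, not one of the trained models analyzed in the paper): with one-hot token embeddings and uniform attention, the attention output at every position is the normalized token histogram, and a feed-forward readout of the query token's own coordinate recovers its count.

```python
# Toy illustration of a single attention layer solving the histogram task.
# One-hot values mixed by uniform attention give histogram(tokens) / L at
# every position; a feed-forward readout rescales the query token's entry.
import numpy as np

def attention_counts(tokens, vocab_size):
    L = len(tokens)
    E = np.eye(vocab_size)[tokens]      # one-hot embeddings, shape (L, V)
    A = np.full((L, L), 1.0 / L)        # uniform attention (softmax of equal scores)
    mix = A @ E                         # each row equals histogram(tokens) / L
    # Feed-forward readout: select each query token's own coordinate, rescale by L.
    return L * mix[np.arange(L), tokens]

tokens = np.array([2, 0, 2, 1, 2, 0])
print(attention_counts(tokens, vocab_size=3))   # -> [3. 2. 3. 1. 3. 2.]
```

The paper's relation-based and inventory-based mechanisms are richer than this uniform-mixing toy, but the sketch shows why small architectural choices (attention sharpness, value contents, readout nonlinearity) can push a model toward qualitatively different counting algorithms.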


An explicit construction of Kaleidocycles

Kaji, Shizuo, Kajiwara, Kenji, Shigetomi, Shota

arXiv.org Artificial Intelligence

We model a family of closed kinematic chains, known as Kaleidocycles, with the theory of discrete spatial curves. By leveraging the connection between the deformation of discrete curves and semi-discrete integrable systems, we describe the motion of a Kaleidocycle by elliptic theta functions. This study showcases an interesting example in which an integrable system generates an orbit in the space of real solutions of polynomial equations defined by geometric constraints.
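For reference, the elliptic theta functions referred to here are the classical Jacobi theta functions; one standard normalization is given below (the paper's specific normalization and arguments may differ).

```latex
% Jacobi theta function, standard normalization:
\[
  \vartheta(z;\tau) \;=\; \sum_{n\in\mathbb{Z}}
    \exp\!\bigl(\pi i n^{2}\tau + 2\pi i n z\bigr),
  \qquad \operatorname{Im}\tau > 0,
\]
% with quasi-periodicity under the lattice generated by 1 and tau:
\[
  \vartheta(z+1;\tau) = \vartheta(z;\tau),
  \qquad
  \vartheta(z+\tau;\tau) = e^{-\pi i \tau - 2\pi i z}\,\vartheta(z;\tau).
\]
```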